Behaviour-Conditioned Policies for Cooperative Reinforcement Learning Tasks
Authors
Abstract
The cooperation among AI systems, and between AI systems and humans, is becoming increasingly important. In various real-world tasks, an agent needs to cooperate with unknown partner agent types. This requires the agent to assess the behaviour of the partner agent during a cooperative task and to adjust its own policy to support the cooperation. Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning. However, adapting to a partner agent's behaviour during an ongoing task requires the ability to assess the partner agent type quickly. We suggest a method where we synthetically produce populations of agents with different behavioural patterns, together with ground truth data of their behaviour, and use this data for training a meta-learner. We additionally suggest an agent architecture which can efficiently use the generated data and gain the meta-learning capability. When an agent is equipped with such a meta-learner, it is capable of quickly adapting to cooperate with unknown partner agent types in new situations. This method can be used to automatically form a task distribution for meta-training from emerging behaviours that arise, for example, through self-play.
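The abstract describes the pipeline only at a high level. As a rough illustration, assuming a toy setting with a discrete action space where each partner "type" is defined by a preferred action, the sketch below generates a synthetic population of partners with ground-truth type labels, fits a model that infers the partner type from a few observed actions, and conditions the agent's own action on that inference. The naive Bayes estimator is a swapped-in stand-in for the deep meta-learner described in the abstract, and every name here (`make_partner`, `fit_type_model`, `BEST_RESPONSE`, and so on) is hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy setting, not taken from the paper.
NUM_ACTIONS = 4  # size of the discrete action space
NUM_TYPES = 3    # number of synthetic partner behaviour types


def make_partner(partner_type, rng, bias=0.7):
    """Hypothetical synthetic partner: plays its type's preferred action with
    probability `bias`, otherwise a uniformly random action."""
    def act():
        if rng.random() < bias:
            return partner_type % NUM_ACTIONS
        return rng.randrange(NUM_ACTIONS)
    return act


def generate_population(n_per_type, traj_len, rng):
    """Roll out a synthetic population, keeping each partner's true type as a label."""
    data = []
    for t in range(NUM_TYPES):
        for _ in range(n_per_type):
            partner = make_partner(t, rng)
            data.append(([partner() for _ in range(traj_len)], t))
    return data


def fit_type_model(data, alpha=1.0):
    """Estimate P(action | type) with Laplace smoothing from the labelled data
    (a naive Bayes stand-in for the learned meta-learner)."""
    counts = {t: Counter() for t in range(NUM_TYPES)}
    for traj, t in data:
        counts[t].update(traj)
    model = {}
    for t, c in counts.items():
        total = sum(c.values()) + alpha * NUM_ACTIONS
        model[t] = [(c[a] + alpha) / total for a in range(NUM_ACTIONS)]
    return model


def infer_type(model, observed_actions):
    """Return the most likely partner type for a short action sequence (uniform prior)."""
    def log_likelihood(t):
        return sum(math.log(model[t][a]) for a in observed_actions)
    return max(model, key=log_likelihood)


# Hypothetical best responses: the own action that supports each partner type.
BEST_RESPONSE = {0: 2, 1: 3, 2: 0}


def conditioned_policy(model, observed_actions):
    """Behaviour-conditioned policy: act according to the inferred partner type."""
    return BEST_RESPONSE[infer_type(model, observed_actions)]


if __name__ == "__main__":
    rng = random.Random(0)
    model = fit_type_model(generate_population(n_per_type=200, traj_len=10, rng=rng))
    new_partner = make_partner(1, rng)  # a fresh type-1 partner not seen in training
    observed = [new_partner() for _ in range(5)]
    print("inferred type:", infer_type(model, observed),
          "own action:", conditioned_policy(model, observed))
```

The design choice mirrors the idea in the abstract: type inference is learned offline from labelled synthetic data, so at test time only a handful of partner actions are needed to choose a supporting action. The method described in the abstract uses deep reinforcement learning rather than a lookup table like `BEST_RESPONSE`.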
Similar resources
Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobje...
Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing
In this paper, we apply reinforcement learning for automatically learning cooperative persuasive dialogue system policies using framing, the use of emotionally charged statements common in persuasive dialogue between humans. In order to apply reinforcement learning, we describe a method to construct user simulators and reward functions specifically tailored to persuasive dialogue based on a cor...
Imitative Policies for Reinforcement Learning
We discuss a reinforcement learning framework where learners observe experts interacting with the environment. Our approach is to construct from these observations exploratory policies which favor selection of actions the expert has taken. This imitation strategy can be applied at any stage of learning, and requires neither that information regarding reinforcement be conveyed from the expert to...
Cooperative Inverse Reinforcement Learning
For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, parti...
Behaviour-Based Reinforcement Learning
Although behaviour-based robotics has been successfully used to develop autonomous mobile robots up to a certain point, further progress may require the integration of a learning model into the behaviour-based framework. Reinforcement learning is a natural candidate for this because it seems well suited to the problems faced by autonomous agents. However, previous attempts to use reinforcement ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-86380-7_40